---
title: Composed Agents
description: Combine grounding models with any LLM for computer-use capabilities
---

Composed agents combine the best of both worlds: specialized grounding models for precise click prediction and powerful LLMs for task planning and reasoning.

Use the format `"grounding_model+planning_model"` to create a composed agent with any vision-enabled LiteLLM-compatible model.

## How Composed Agents Work

1. **Planning Phase**: The planning model (LLM) analyzes the task and decides what actions to take (e.g., `click("find the login button")`, `type("username")`)
2. **Grounding Phase**: The grounding model converts element descriptions to precise coordinates
3. **Execution**: Actions are performed using the predicted coordinates

## Supported Grounding Models

Any model that supports `predict_click()` can be used as the grounding component. See the full list on [Grounding Models](./grounding-models).

- OpenCUA: `huggingface-local/xlangai/OpenCUA-{7B,32B}`
- GTA1 family: `huggingface-local/HelloKKMe/GTA1-{7B,32B,72B}`
- Holo 1.5 family: `huggingface-local/Hcompany/Holo1.5-{3B,7B,72B}`
- InternVL 3.5 family: `huggingface-local/OpenGVLab/InternVL3_5-{1B,2B,4B,8B,...}`
- UI‑TARS 1.5: `huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B` (also supports full CU)
- OmniParser (OCR): `omniparser` (requires combination with a LiteLLM vision model)
- Moondream3: `moondream3` (requires combination with a LiteLLM vision/text model)

## Supported Planning Models

Any vision-enabled LiteLLM-compatible model can be used as the planning component:

- Any All‑in‑one CUA (planning-capable). See [All‑in‑one CUAs](./computer-use-agents).
- Any VLM via LiteLLM providers: `anthropic/*`, `openai/*`, `openrouter/*`, `gemini/*`, `vertex_ai/*`, `huggingface-local/*`, `mlx/*`, etc.
- Examples:
  - **Anthropic**: `anthropic/claude-3-5-sonnet-20241022`, `anthropic/claude-opus-4-1-20250805`
  - **OpenAI**: `openai/gpt-5`, `openai/gpt-o3`, `openai/gpt-4o`
  - **Google**: `gemini/gemini-1.5-pro`, `vertex_ai/gemini-pro-vision`
  - **Local models**: Any Hugging Face vision-language model

## Usage Examples

### GTA1 + GPT-5

Use OpenAI's GPT-5 for planning with specialized grounding:

```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+openai/gpt-5",
    tools=[computer]
)

async for _ in agent.run("Take a screenshot, analyze the UI, and click on the most prominent button"):
    pass
```

### GTA1 + Claude 3.5 Sonnet

Combine state-of-the-art grounding with powerful reasoning:

```python
agent = ComputerAgent(
    "huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022",
    tools=[computer]
)

async for _ in agent.run("Open Firefox, navigate to github.com, and search for 'computer-use'"):
    pass
# Success! 🎉
# - Claude 3.5 Sonnet plans the sequence of actions
# - GTA1-7B provides precise click coordinates for each UI element
```

### UI-TARS + GPT-4o

Combine two different vision models for enhanced capabilities:

```python
agent = ComputerAgent(
    "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B+openai/gpt-4o",
    tools=[computer]
)

async for _ in agent.run("Help me fill out this form with my personal information"):
    pass
```

### Moondream3 + GPT-4o

Use the built-in Moondream3 grounding with any planning model. Moondream3 will detect UI elements on the latest screenshot, label them, and provide a user message listing detected element names.

```python
from agent import ComputerAgent
from computer import computer

agent = ComputerAgent(
    "moondream3+openai/gpt-4o",
    tools=[computer]
)

async for _ in agent.run("Close the settings window, then open the Downloads folder"):
    pass
```

## Benefits of Composed Agents

- **Specialized Grounding**: Use models optimized for click prediction accuracy
- **Flexible Planning**: Choose any LLM for task reasoning and planning
- **Cost Optimization**: Use smaller grounding models with larger planning models only when needed
- **Performance**: Leverage the strengths of different model architectures

## Capabilities

Composed agents support both capabilities:

```python
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B+anthropic/claude-3-5-sonnet-20241022")

# Full computer-use agent capabilities
async for _ in agent.run("Complete this online form"):
    pass

# Direct click prediction (uses grounding model only)
coords = agent.predict_click("find the submit button")
```

---

For more information on individual model capabilities, see [Computer-Use Agents](./computer-use-agents) and [Grounding Models](./grounding-models).
